Evaluating Topic Modeling as Preprocessing for a Sentiment Analysis Task

Author

  • John Roesler
Abstract

Classifying the sentiment of documents is a well-studied problem in Natural Language Processing (NLP). The availability of excellent discriminative classifiers such as maximum entropy (MaxEnt) models has pushed most research toward feature engineering. In this paper, I examine an unusual class of features: the document-topic proportions assigned by the Latent Dirichlet Allocation (LDA) topic model. In particular, I evaluate the relative performance of topic proportions alone, topic proportions together with unigrams, and unigrams alone. I find that topic proportions do not add enough information to significantly improve classification performance. I also perform domain adaptation experiments and find that, although topic proportions do not outperform unigram features, classification based on topic proportions shows some promise.
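The comparison described in the abstract can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the author's code: the abstract does not state which LDA inference method, MaxEnt toolkit, number of topics, or hyperparameters were used, so collapsed Gibbs sampling, a binary logistic-regression form of MaxEnt trained by SGD, the values of `alpha`, `beta`, `lr` and `l2`, every function name, and the toy corpus are all assumptions. The sketch builds the three feature sets the paper compares (topic proportions alone, topic proportions plus unigrams, unigrams alone) and trains the same classifier on each.

```python
import math
import random
from collections import Counter

def lda_topic_proportions(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (assumed method); returns theta_d per document."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    n_dk = [[0] * num_topics for _ in docs]        # tokens in doc d assigned to topic k
    n_kw = [Counter() for _ in range(num_topics)]  # count of word w assigned to topic k
    n_k = [0] * num_topics                         # total tokens assigned to topic k
    z = []                                         # current topic of every token
    for d, doc in enumerate(docs):
        z.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    topics = range(num_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                n_dk[d][k] -= 1  # take token i out of the counts
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # p(z_i = t | all other assignments), up to a normalizing constant
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + vocab_size * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    # point estimate of each document's topic proportions from the final sample
    return [[(n_dk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]

def build_features(doc, theta, mode):
    """Sparse features: 'unigram' counts, 'topic' proportions, or 'both'."""
    feats = {}
    if mode in ("unigram", "both"):
        for w, c in Counter(doc).items():
            feats["w=" + w] = float(c)
    if mode in ("topic", "both"):
        for t, p in enumerate(theta):
            feats[f"topic={t}"] = p
    return feats

def sigmoid(s):
    # numerically stable logistic function
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def score(weights, bias, x):
    return bias + sum(weights.get(f, 0.0) * v for f, v in x.items())

def train_maxent(xs, ys, epochs=50, lr=0.5, l2=0.01):
    """Binary MaxEnt (= logistic regression) fit by SGD on the L2-penalized log-likelihood."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            g = y - sigmoid(score(weights, bias, x))  # d(log-likelihood)/d(score)
            bias += lr * g
            for f, v in x.items():
                w = weights.get(f, 0.0)
                # L2 shrinkage is applied only to the active features (a common shortcut)
                weights[f] = w + lr * (g * v - l2 * w)
    return weights, bias

def predict(model, x):
    weights, bias = model
    return 1 if score(weights, bias, x) > 0 else 0

# Toy corpus invented for illustration only; 1 = positive, 0 = negative.
docs = [["great", "movie", "good", "acting"], ["good", "plot", "great", "fun"],
        ["bad", "movie", "awful", "acting"], ["awful", "plot", "bad", "boring"]]
labels = [1, 1, 0, 0]
thetas = lda_topic_proportions(docs, num_topics=2)
accuracy = {}
for mode in ("topic", "both", "unigram"):
    xs = [build_features(doc, th, mode) for doc, th in zip(docs, thetas)]
    model = train_maxent(xs, labels)
    preds = [predict(model, x) for x in xs]
    accuracy[mode] = sum(p == y for p, y in zip(preds, labels)) / len(labels)
print(accuracy)  # training accuracy only; a real evaluation needs held-out data
```

The printed numbers are training accuracy on four invented sentences and say nothing about the paper's results; a faithful replication would need the original corpora, held-out (and, for domain adaptation, cross-domain) evaluation, and the author's actual settings.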

Similar resources

Tweester at SemEval-2016 Task 4: Sentiment Analysis in Twitter Using Semantic-Affective Model Adaptation

We describe our submission to SemEval2016 Task 4: Sentiment Analysis in Twitter. The proposed system ranked first for the subtask B. Our system comprises multiple independent models such as neural networks, semantic-affective models and topic modeling that are combined in a probabilistic way. The novelty of the system is the employment of a topic modeling approach in order to adapt the seman...

A Topic based Approach for Sentiment Analysis on Twitter Data

Twitter has grown in popularity during the past decades. It is now used by millions of users who share information about their daily life and their feelings. In order to automatically process and analyze these data, applications can rely on analysis methods such as sentiment analysis and topic modeling. This paper contributes to the sentiment analysis research field. First, the preprocessing st...

Improving Twitter Sentiment Analysis with Topic-Based Mixture Modeling and Semi-Supervised Training

In this paper, we present multiple approaches to improve sentiment analysis on Twitter data. We first establish a state-of-the-art baseline with a rich feature set. Then we build a topic-based sentiment mixture model with topic-specific data in a semi-supervised training framework. The topic information is generated through topic modeling based on an efficient implementation of Latent Dirichlet...

Sentiment Analysis using neural architectures

Most sentiment analysis approaches are based on heavy preprocessing of the data which involves carefully choosing the right features based on the nature of the data, intuitive analysis and factors like language. I explore an unconventional approach to sentiment analysis based on recently proposed neural network architectures for NLP that parallel traditional well-performing approaches, both in ...

Joint Sentiment/Topic Modeling on Text Data Using Boosted Restricted Boltzmann Machine

Recently, with the development of the Internet and the Web, different types of social media such as web blogs have become an immense source of text data. Through the processing of these data, it is possible to discover practical information about different topics, individuals' opinions and a thorough understanding of the society. Therefore, applying models which can automatically extract the subjective ...

Publication date: 2012